Abstract

A theory input is needed to estimate the largest background for the H → WW
search at the LHC: the shape of the $m_{\ell\ell}$ spectrum from continuum WW
production. We find that this shape depends on how NLO matrix elements are merged
with parton showers, and we compare the results from a number of different
implementations. The results suggest that both the size of the background estimate
and its uncertainty may have been underestimated. This conclusion is reinforced by
a "Note Added", which comments on the LHC results released on November 14, 2012.

The Higgs to WW search channel at the LHC suffers from substantial backgrounds, the
dominant one being continuum WW production. This background in the signal region is
estimated by counting events in a control region, subtracting off other backgrounds to deduce
the number of continuum WW events in the control region, and then extrapolating that
number to the signal region. This extrapolation requires theory input, namely the ratio of
WW cross sections in the signal and control regions. This ratio is labelled $\alpha$ in [1], and it is
obtained from a particular NLO Monte Carlo simulation in the case of ATLAS [2] and from an
unspecified simulation in the case of CMS [3]. Since more details are given by ATLAS we shall
focus on their analysis of the decay mode WW → eνμν. We shall also focus on the 0-jet bin
and on the ratio $\alpha_0$. In this bin the WW background is about 70% of the total background in
the signal region. A 125 GeV Higgs signal is about 14% of the WW background (here NNLO
effects are included in the Higgs signal estimation), and thus theoretical uncertainties in $\alpha_0$
of similar size can have a relatively large impact on the analysis.

In terms of a fixed order parton level description, $\alpha_0$ can be obtained at NLO. A
calculation of $\alpha_0$ at NNLO may eventually become available, since a complete NNLO calculation
of diphoton production has been performed [4] and partial NNLO results for WZ production
have also appeared [5]. In these examples the NNLO corrections are large, but currently the
potential impact of such corrections on the H → WW background has not been considered.
For the simulation of background at the detector level, fully showered and hadronized events
are required, and so at the very least parton showers need to be merged with the NLO event
generation. As we shall explore, the introduction of parton showering tends to increase $\alpha_0$
from the NLO result, and the different implementations of the parton shower give a
range of values. The parton shower includes some effects that would be included at NNLO,
and thus these results can also give some idea of the possible size of higher order corrections.

The value of $\alpha_0$ used in the ATLAS analysis can be obtained from Table 2 of [2] as follows.¹
The listed number of WW events in the signal region after cuts (234) includes a correction
factor. This factor, when applied to the listed number of WW events in the control region
(531), is designed to push the total number of events in the control region (789) from all
backgrounds up to the observed value (820). Thus $\alpha_0 = 234/(531 + 820 - 789) = 0.416$. We
emphasize that $\alpha_0$ is not determined from data, that a Monte Carlo simulation is needed to
obtain it, and that it determines the overall number of WW background events in the signal
region given measurements in the control region.
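The arithmetic above can be reproduced in a few lines. This is only a sketch of the bookkeeping as described in the text; the variable and function names are illustrative and do not come from the ATLAS analysis.

```python
# Counts quoted in the text from Table 2 of [2] (0-jet bin).
WW_SIGNAL = 234         # WW events in the signal region, correction factor included
WW_CONTROL = 531        # WW events listed for the control region
TOTAL_CONTROL = 789     # all listed backgrounds in the control region
OBSERVED_CONTROL = 820  # events observed in the control region


def extrapolation_ratio(ww_signal, ww_control, total_control, observed_control):
    """Return N_WW(signal) / N_WW(control), with the control-region WW count
    raised so that the total background matches the observed count."""
    ww_control_corrected = ww_control + (observed_control - total_control)
    return ww_signal / ww_control_corrected


alpha_0 = extrapolation_ratio(WW_SIGNAL, WW_CONTROL, TOTAL_CONTROL, OBSERVED_CONTROL)
print(f"alpha_0 = {alpha_0:.3f}")  # prints "alpha_0 = 0.416"
```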

We shall define another quantity $\alpha'_0$ which is slightly different from $\alpha_0$. It has some
advantages: 1) it is a property of a single distribution, the $m_{\ell\ell}$ distribution in the presence of
certain cuts, 2) it does not depend on Monte Carlo modelling of the tail of this distribution at
arbitrarily high $m_{\ell\ell}$, 3) it can be obtained from [2] without ambiguity, namely from Fig. (14b).
We define $\alpha'_0$ as the ratio of the number of continuum WW events in the $10 < m_{\ell\ell} < 50$ GeV
region over the number in the $80 < m_{\ell\ell} < 290$ GeV region with the cuts that were used to
obtain Fig. (14b). 290 GeV is the maximum $m_{\ell\ell}$ in Fig. (14b). By discretizing the figure and
pulling out the WW contribution we obtain $\alpha'_0 = 0.457$. This $\alpha'_0$ is 10% larger than $\alpha_0$ and this
difference is due to 1) not including the $\Delta\phi_{\ell\ell} < 1.8$ cut on the signal region, and 2) omitting
the contribution from $m_{\ell\ell} > 290$ GeV in the control region. Of the 10% difference, from Table
2 in [2] we see that the first effect gives 4%, and so the $m_{\ell\ell} > 290$ GeV tail is a 6% effect
according to ATLAS.
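As an illustration of this definition, the ratio can be computed from a list of dilepton masses (in GeV) and optional event weights. This is a minimal sketch: the function name, the window arguments and the toy masses are invented for the example, and weights are allowed to be negative since some NLO generators produce such events.

```python
def alpha_prime(mll_values, weights=None,
                signal=(10.0, 50.0), control=(80.0, 290.0)):
    """Ratio of summed weights in the signal window to the control window."""
    if weights is None:
        weights = [1.0] * len(mll_values)
    n_signal = 0.0
    n_control = 0.0
    for mll, weight in zip(mll_values, weights):
        if signal[0] < mll < signal[1]:
            n_signal += weight
        elif control[0] < mll < control[1]:
            n_control += weight
    if n_control == 0.0:
        raise ValueError("no weight in the control window")
    return n_signal / n_control


# Toy masses (made up): three fall in each window, two fall in neither.
toy_mll = [12.0, 30.0, 45.0, 60.0, 85.0, 120.0, 200.0, 300.0]
toy_ratio = alpha_prime(toy_mll)

# Relative size of the two ATLAS-derived numbers quoted in the text.
relative_shift = 0.457 / 0.416 - 1.0
print(toy_ratio, f"{100 * relative_shift:.0f}%")
```

The last line shows that the two quoted values differ by roughly 10%, as stated.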

For the WW background ATLAS [2] uses MC@NLO and Herwig6 for event generation
followed by a full detector simulation. We shall obtain $\alpha'_0$ from a number of different NLO Monte
Carlo tools, but without the detector simulation. The range of these values will translate into
a theoretical uncertainty in the final estimate of the WW background.

Our representation of the ATLAS cuts is as follows.

• an $e^\pm \mu^\mp$ pair, with $p_T$ thresholds of 25 and 15 GeV for the two leptons

• $|\eta| < 2.5$ for leptons

• $\Delta R > 0.3$ between leptons

• no jet with $p_T > 25$ GeV (anti-$k_T$ algorithm with R = 0.4)

• $E_T^{\rm miss} \sin(\min(\Delta\phi, \pi/2)) > 25$ GeV

$\Delta\phi$ is the minimum azimuthal angle between $E_T^{\rm miss}$ and either of the two leptons. With these
cuts our values of $\alpha'_0$ are obtained as the ratio of cross sections in the $10 < m_{\ell\ell} < 50$ GeV
and $80 < m_{\ell\ell} < 290$ GeV regions. Sometimes we will replace the jet veto with another nearly
equivalent cut described below. $\Delta R > 0.3$ is only included because it is effectively implied by
lepton isolation requirements, and if omitted $\alpha'_0$ would only increase very slightly.

1 Pierre Savard, private communication.
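The cut list above can be sketched as a generator-level selection function. This is an illustration under stated assumptions rather than the analysis code used here: the `Lepton` record and function names are hypothetical, jets are assumed to have been clustered already (only their $p_T$ values are passed in), and the 25 and 15 GeV thresholds are assumed to apply to the leading and subleading lepton respectively.

```python
import math
from dataclasses import dataclass


@dataclass
class Lepton:
    flavor: str   # "e" or "mu"
    charge: int   # +1 or -1
    pt: float     # GeV
    eta: float
    phi: float    # radians


def delta_phi(phi1, phi2):
    """Azimuthal separation folded into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - d if d > math.pi else d


def delta_r(l1, l2):
    return math.hypot(l1.eta - l2.eta, delta_phi(l1.phi, l2.phi))


def met_rel(met, met_phi, leptons):
    """E_T^miss * sin(min(dphi, pi/2)), dphi = smallest angle to either lepton."""
    dphi = min(delta_phi(met_phi, lep.phi) for lep in leptons)
    return met * math.sin(min(dphi, math.pi / 2.0))


def passes_cuts(l1, l2, jet_pts, met, met_phi):
    if {l1.flavor, l2.flavor} != {"e", "mu"}:
        return False                          # need one electron and one muon
    if l1.charge * l2.charge >= 0:
        return False                          # opposite charges
    lead, sub = sorted((l1.pt, l2.pt), reverse=True)
    if lead <= 25.0 or sub <= 15.0:
        return False                          # assumed leading/subleading thresholds
    if abs(l1.eta) >= 2.5 or abs(l2.eta) >= 2.5:
        return False
    if delta_r(l1, l2) <= 0.3:
        return False
    if any(pt > 25.0 for pt in jet_pts):
        return False                          # jet veto
    return met_rel(met, met_phi, (l1, l2)) > 25.0


electron = Lepton("e", -1, 40.0, 0.5, 0.0)
muon = Lepton("mu", +1, 20.0, -0.3, 2.0)
print(passes_cuts(electron, muon, [], 50.0, -2.0))      # MET well separated from leptons
print(passes_cuts(electron, muon, [30.0], 50.0, -2.0))  # fails the jet veto
```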

We apply these cuts at the generator level while ATLAS applies these cuts at the detector
level to obtain their Fig. (14b). At the detector level lepton isolation requirements are imposed
and many other factors come into play, such as jet scale uncertainties, the modelling of
pile-up, triggering efficiencies, etc. We shall deduce that detector level effects cause a substantial
decrease in $\alpha'_0$, or in other words a significant distortion of the $m_{\ell\ell}$ distribution. The large size
of the detector level effects suggests that their modelling could be another source of uncertainty.

Before considering the $m_{\ell\ell}$ distribution we first consider another leptonic distribution which
is especially sensitive to the introduction of the parton shower. The $p_T$ of the WW system
is obviously sensitive to the partonic and gluonic degrees of freedom against which it recoils.
$p_T^{WW} < 25$ GeV is the relevant range for the 0-jet bin and we display examples of this
distribution in the presence of the cuts in Fig. (1). The POWHEG-BOX [6] package allows direct
comparison of the fixed order NLO result and the LHE result (from the Les Houches event
file before showering) with the showered results using Pythia6 [7] and Herwig6 [8]. We also
include the result from MC@NLO (4.09) [9] using Herwig6 for showering. The POWHEG
LHE events include a (N)LL resummation of soft gluon effects. We see that this is the key
feature of the parton shower which removes the infrared singularity of the fixed order NLO
result and completely changes the $p_T^{WW}$ spectrum in the 0-jet bin.

MC@NLO and POWHEG generated events can also be showered through the newer
programs Herwig++ (2.6.1) [10] and Pythia8 (8.17) [11] respectively. Another independent
result can be obtained from Herwig++, which has its own internal implementation [12] of a
POWHEG-style NLO event generator. We label this tool Herwig++@NLO. We also include
a result from a beta version of aMC@NLO [13] available in the MadGraph 5 framework [14].
To implement a common analysis for these newer tools, and for Sherpa (1.4.2) [15] as well,
we make use of their ability to generate HepMC [16] format event files. This is in contrast to
the tools mentioned in the previous paragraph, where we instead use their built-in analysis
routines suitably modified to include the cuts. We also use the strict NLO parton level tools,
MCFM (6.3) [17] and VBFNLO (2.6.0) [18], with their built-in analysis routines.

Figure 1: The $p_T^{WW}$ distribution. The NLO result has a singularity structure at $p_T = 0$ such
that the actual value for the lowest bin is the negative of the value shown. The NLO values
are also divided by 14.

When available we feed the HepMC event files through the Delphes fast detector simulator
[19].² This implements jet finding based on FastJet [20], where we select the anti-$k_T$ algorithm
with R = 0.4. Delphes produces LHCO files containing the reconstructed objects, and it is
at this stage that we apply the cuts listed above via MadAnalysis [21]. We turn off lepton
isolation requirements and the trigger in Delphes to minimize detector effects. The tracking
efficiency is found to have no impact on $\alpha'_0$ and we set it to 100%. The remaining aspects of
detector simulation that Delphes still implements are found to have very little impact on the
value of $\alpha'_0$.

We thus obtain the $m_{\ell\ell}$ distributions with the cuts imposed, and from these we obtain
the values for $\alpha'_0$ listed in Table 1. We have used Sherpa to obtain the lowest order result
for $\alpha'_0$. We also use Sherpa to introduce a parton shower through its CKKW style merging
of the shower with matrix elements involving 0 and 1 additional partons. Sherpa in addition
implements an internal POWHEG-style NLO event generator, but currently it is unable to
produce unweighted events for NLO processes and so this precludes an NLO entry from Sherpa.

2 Delphes is slightly modified to properly handle the negative weights in the HepMC events produced by
MC@NLO and aMC@NLO.
Tool                  $\alpha'_0$
ATLAS                 0.457
Sherpa-LO             0.54
Sherpa-LO-shower      0.58
MCFM                  0.555*
VBFNLO                0.555*
POWHEG-NLO            0.55
POWHEG-LHE            0.57
MC@NLO-Herwig6        0.55
POWHEG-Pythia6        0.59
POWHEG-Herwig6        0.60
aMC@NLO-Herwig++      0.545
MC@NLO-Herwig++       0.55
POWHEG-Pythia8        0.59
Herwig++@NLO          0.625

Table 1: Values of $\alpha'_0$. A number of runs are compared in each case to arrive at standard error
estimates less than ±0.005. The ATLAS number includes the effect of detector simulation
while the other numbers do not. *These numbers include a 3% contribution from gg → WW.

As we have mentioned, the similar quantity $\alpha_0$ was introduced and studied in [1]. No
numerical values of $\alpha_0$ were given, but it was noted that MCFM and MC@NLO-Herwig6
produced values in good agreement. The suggestion from this reference is that the theoretical
errors on $\alpha_0$ are small. We also see the agreement between MCFM and MC@NLO-Herwig6,
but the other results in Table 1 suggest a different conclusion regarding theoretical errors.

Both MCFM and VBFNLO implement the gg → WW process, which proceeds through a
quark loop. Although strictly of higher order, it is numerically relevant due to enhancement
from the gluon PDFs. ATLAS also incorporates this process but the other event generators
in Table 1 do not implement it. VBFNLO allows $\alpha'_0$ to be obtained with and without this
contribution, and it is found that the gg → WW process increases $\alpha'_0$ by about 3%. Thus to
include this effect the results from the other generators should be increased by 3%.

MC@NLO-Herwig6 is the event generator used by ATLAS [2] (but see Note Added), and
the value it gives for $\alpha'_0$ is interesting for two reasons. First, this $\alpha'_0$ happens to be at
the low end of the NLO+shower estimates, and so in this sense the ATLAS estimate of the
WW background could be on the low side. Second, this value for $\alpha'_0$ is about (20 + 3)%
larger than the $\alpha'_0$ extracted from the ATLAS analysis. This surprisingly large difference is
the amount by which detector level effects have distorted the $m_{\ell\ell}$ distribution. A distortion of this
size in a simple leptonic distribution, which happens to be in the direction of lowering the
apparent background, deserves further study.

Figure 2: Comparison of $m_{\ell\ell}$ distributions of the WW continuum background normalized in
the $80 < m_{\ell\ell} < 290$ GeV region. Only the ATLAS values include the full detector simulation.
The two event generators do not include the gg → WW contribution. Also shown is the
simulated 125 GeV Higgs signal in the signal region. ATLAS results are obtained from Fig.
(14b) of [2].
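The "(20 + 3)%" comparison between MC@NLO-Herwig6 and the ATLAS-derived value can be checked numerically. The sketch below uses the values quoted in the text (0.457 from Fig. (14b) of [2], 0.55 from our MC@NLO-Herwig6 run, and a 3% gg → WW correction); the variable names are illustrative, and applying the 3% multiplicatively is an assumption of this example.

```python
ATLAS_ALPHA_PRIME = 0.457     # extracted from Fig. (14b) of [2], detector level
MCATNLO_ALPHA_PRIME = 0.55    # MC@NLO-Herwig6, generator level, no gg -> WW
GG_CORRECTION = 0.03          # relative increase from gg -> WW

# Excess of the generator-level value over the ATLAS value, in percent.
raw_excess = 100.0 * (MCATNLO_ALPHA_PRIME / ATLAS_ALPHA_PRIME - 1.0)

# Same comparison after scaling the generator value up for gg -> WW.
corrected_excess = 100.0 * (
    MCATNLO_ALPHA_PRIME * (1.0 + GG_CORRECTION) / ATLAS_ALPHA_PRIME - 1.0
)

print(f"without gg: {raw_excess:.0f}%, with gg: {corrected_excess:.0f}%")
```

The first number is the 20% quoted above, and the second is close to the combined 23%, the small excess coming from compounding the two factors.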

Table 1 displays some interesting patterns among the showered results. The differences
are mainly due to the different NLO+shower implementations rather than the
different generators used for showering. The MC@NLO (and aMC@NLO) values are similar
to the pure NLO value, while the POWHEG values are larger. In particular, when MC@NLO
and POWHEG are compared with the same parton shower (Herwig6) the $\alpha'_0$ from POWHEG
is ~9% larger. The Herwig++@NLO implementation yields an even larger $\alpha'_0$ that is ~14%
larger than the MC@NLO value. This is a significant difference since it is larger than the size
of the Higgs signal.
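The percentages quoted in this paragraph follow directly from the Table 1 entries. A short check, with a hypothetical helper name:

```python
# Values of alpha'_0 taken from Table 1.
ALPHA_PRIME = {
    "MC@NLO-Herwig6": 0.55,
    "POWHEG-Herwig6": 0.60,
    "Herwig++@NLO": 0.625,
}


def percent_larger(value, reference):
    """How much larger `value` is than `reference`, in percent."""
    return 100.0 * (value / reference - 1.0)


powheg_vs_mcatnlo = percent_larger(
    ALPHA_PRIME["POWHEG-Herwig6"], ALPHA_PRIME["MC@NLO-Herwig6"]
)
herwigpp_vs_mcatnlo = percent_larger(
    ALPHA_PRIME["Herwig++@NLO"], ALPHA_PRIME["MC@NLO-Herwig6"]
)
print(f"{powheg_vs_mcatnlo:.1f}% {herwigpp_vs_mcatnlo:.1f}%")  # prints "9.1% 13.6%"
```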

The differences between ATLAS, MC@NLO-Herwig6 and Herwig++@NLO are made clear
in Fig. (2). The $m_{\ell\ell}$ distributions have been normalized to have an equal weight in the
$80 < m_{\ell\ell} < 290$ GeV region. The $10 < m_{\ell\ell} < 50$ GeV signal region is indicated.

We now give some details that could be useful for reproducing our results. At the strict
NLO level the WW system recoils against a single extra parton. Thus we could expect that
replacing the jet veto with a cut $p_T^{WW} < 25$ GeV would produce very similar results. We
confirm that this is true at the level of showered events by using POWHEG-BOX, which
implements a FastJet algorithm on its showered events. Thus to avoid problems associated
with jets in the various analysis routines, for example when events are not showered or when
jet finding is not available, we use the $p_T^{WW}$ cut rather than the jet veto in all cases where
Delphes is not used.
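For illustration, the $p_T^{WW}$ cut can be evaluated at generator level from the transverse momenta of the four W decay products (two charged leptons and two neutrinos). This is a sketch only; the function names and the example momenta are invented, and the actual analysis routines of the individual tools differ.

```python
import math


def pt_ww(decay_products):
    """Transverse momentum of the WW system from (px, py) pairs in GeV."""
    px = sum(p[0] for p in decay_products)
    py = sum(p[1] for p in decay_products)
    return math.hypot(px, py)


def passes_ptww_cut(decay_products, max_pt=25.0):
    """Stand-in for the jet veto: require p_T^WW below `max_pt`."""
    return pt_ww(decay_products) < max_pt


# Made-up event: (px, py) of e, mu and the two neutrinos.
event = [(30.0, 0.0), (-20.0, 5.0), (10.0, -5.0), (-15.0, 0.0)]
print(pt_ww(event), passes_ptww_cut(event))  # prints "5.0 True"
```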

MC@NLO, aMC@NLO and Herwig++@NLO are run with their default choices for the
renormalization scale, while the renormalization scale for POWHEG, Sherpa, MCFM and
VBFNLO is chosen to be the mass of the WW system, $m_{WW}$. Typically we find that the
choice of renormalization scale has only a mild effect on $\alpha'_0$, at least compared to the other
variations of $\alpha'_0$ we are finding.

The cteq6l1 PDF is used with Sherpa while the CT10 PDF is used in nearly all other cases,
both at the NLO event generation stage and in the showering stage. The exception is the
POWHEG-BOX package where CT10 is used for event generation but Pythia6 and Herwig6
are used with their default PDFs. We have not considered the uncertainties associated with
PDFs. For the studies involving HepMC event files we turn off the hadronization and multiple
parton interactions since we find that this has little or no impact on $\alpha'_0$ (unless we consider
lepton isolation cuts, see below). Pythia6 and Herwig6 are run with hadronization turned on.

One might wonder about leptonically decaying τ's originating from one or both of the W's.
We have used Sherpa to check that these configurations by themselves give an $\alpha'_0$ just slightly
larger than usual, so this effect on $\alpha'_0$ is negligible.

Other than what we have described, the various programs are run essentially with their
default settings, and any further details can be obtained from the author. We mention in
particular that Pythia8 gives the user easy access to a number of settings for the showering
of POWHEG events; a sampling of different choices for these settings gives values of $\alpha'_0$ that
differ by a few percent.

We return to the question of the detector level effects, which we have thus far avoided
in our use of Delphes. To see what a fast detector simulator can say about these effects we
first turn on lepton isolation cuts as follows. For both electrons and muons the summed $p_T$ of
tracks in an R = 0.3 cone around the lepton (excluding the lepton itself) is required to be less
than 0.1. For muons the ratio of $E_T$ in a 3 x 3 calorimeter array around the muon (including
the muon's cell) to the $p_T$ of the muon is required to be less than 0.1. When Delphes is run
on fully showered and hadronized events with MPI turned on, we find a 5% reduction in the
value of $\alpha'_0$. So this goes only a small way toward bridging the gap between our values of $\alpha'_0$ and the
ATLAS value.

When we turn the Delphes trigger emulation on we find that this has no influence on $\alpha'_0$.
This is not surprising since the same information is used in Delphes for the trigger emulation
as for the final analysis, and so Delphes is blind to the differing resolutions inherent in the
real triggers. The real triggers could in principle affect $\alpha'_0$.

Another quantity for which a fast detector simulator is over-idealized is $E_T^{\rm miss}$, and so we
briefly consider the effect of deviations between true and measured $E_T^{\rm miss}$. We can do this in
the final analysis of the LHCO events, where to each event we add a fake missing energy vector
of some fixed magnitude and random direction in the transverse plane. This will modify the
effect of the missing energy cut described above. We find that the effect is to increase $\alpha'_0$ by
a percent or two for 20 GeV of fake missing energy per event. So the answer does not appear
to lie with missing energy.
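The fake missing energy procedure described above can be sketched as follows. This is an assumption-laden illustration, not the code used for the LHCO analysis: the function name and the Cartesian (x, y) representation of the missing transverse energy are choices made for the example.

```python
import math
import random


def add_fake_met(met_x, met_y, magnitude=20.0, rng=random):
    """Add a vector of fixed magnitude (GeV) and uniformly random azimuth
    to the missing transverse energy vector (met_x, met_y)."""
    phi = rng.uniform(-math.pi, math.pi)
    return (met_x + magnitude * math.cos(phi),
            met_y + magnitude * math.sin(phi))


# Example with a seeded generator so that the result is reproducible.
rng = random.Random(1)
new_x, new_y = add_fake_met(30.0, -10.0, magnitude=20.0, rng=rng)
shift = math.hypot(new_x - 30.0, new_y + 10.0)
print(f"shift = {shift:.1f} GeV")  # the displacement always has the chosen magnitude
```

The smeared vector would then be passed to the missing energy cut in place of the original one.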

To summarize, the shape of the $m_{\ell\ell}$ distribution from continuum WW production is crucial
for estimating the background for the H → WW search. We find that the effect of merging
parton showers with NLO event generation can cause greater enhancement of the background
estimate than caused by the strictly NLO corrections. There is a significant difference in the
amount of enhancement depending on whether MC@NLO, POWHEG or Herwig++@NLO is
used. This translates into a significant theoretical uncertainty which may not have been fully
accounted for in the experimental analyses. Large detector level effects also remain obscure.

There is no guarantee that the sampling of results from different tools fully reflects the
true theoretical uncertainty. One could hope for guidance from the measurements of the WW
cross sections at the LHC. While the individual measurements are still consistent with the
NLO predictions, the central values of the various measurements are systematically higher
than prediction, and so large corrections beyond NLO are still allowed and even hinted at.³
We also note that the NLO prediction to which these measurements are compared is MCFM
with the renormalization scale chosen to be $m_W$, rather than our choice of $m_{WW}$. Lowering
the renormalization scale to $m_W$ serves to increase the cross section prediction and it also
increases $\alpha'_0$ slightly.

Note Added 

After the original version of this paper was posted, ATLAS released an updated H → WW
analysis [22] based on 13 fb$^{-1}$ of $\sqrt{s} = 8$ TeV data.

1) The event generator used for the continuum WW background was changed from
MC@NLO-Herwig6 to POWHEG-Pythia8. According to our Table 1 this increases the WW background
estimate in the 0-jet signal region by about 7%.⁴ We can check this by using the same
procedure as described above (now using Table 4 and Fig. (15c) of [22]) and thus extract $\alpha_0 = 0.446$
and $\alpha'_0 = 0.491$ from the new ATLAS analysis.⁵ These numbers are indeed 7% larger than
the previous numbers.

3 I thank Adam Martin for this observation.

4 This could also be seen in the original version of this paper.

5 There is a new cut in the 0-jet bin but it produces a tiny effect and we ignore it.

2) ATLAS notes that POWHEG-Pythia8 poorly describes the relative number of events in
the 1-jet and 0-jet control regions. The 1-jet to 0-jet ratio from the data is 0.74 ± 0.08 times
the POWHEG-Pythia8 prediction. We find that the Herwig++@NLO prediction is 0.71 times
the POWHEG-Pythia8 prediction and so Herwig++@NLO does much better in this regard.
At the same time the shapes of the $m_{\ell\ell}$ distribution in the 0-jet control and signal regions
produced by these two event generators are very similar.

3) We can also consider an extrapolation parameter for the 1-jet bin. According to the
cuts in [22] we define $\alpha'_1$ in the same way as $\alpha'_0$ except that we require one $p_T > 25$ GeV jet
and we remove the p?fi cut.⁶ We find that Herwig++@NLO produces a value for $\alpha'_1$ which is
12% larger than the POWHEG-Pythia8 value.

4) The uncertainty in the extrapolation parameters that ATLAS associates with the choice 
of event generator (pre-shower) is 3.5% while for parton showers and underlying event it is 
4.5%, for both the 0-jet and 1-jet bins (Table 2 of [22]). From our results we conclude that 
these numbers are underestimates of the theoretical uncertainties. 

5) CMS continues [22] to give little information about extrapolation parameters and how
they are obtained. CMS uses a control region farther from the signal region ($m_{\ell\ell} > 100$
GeV rather than $m_{\ell\ell} > 80$ GeV) and this will increase the theoretical uncertainty of the
extrapolation. The total uncertainty, which presumably includes the renormalization scale
and PDF uncertainties that ATLAS estimates separately, is given as 10%. This also appears
to be an underestimate.
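Returning to item 1) of this note, the 7% shift can be cross-checked against both the ATLAS-derived numbers and Table 1. The sketch below simply takes ratios of values quoted in the text; the dictionary keys and variable names are illustrative.

```python
# Values extracted from the earlier [2] and updated [22] ATLAS analyses.
PREVIOUS = {"alpha_0": 0.416, "alpha_prime_0": 0.457}
UPDATED = {"alpha_0": 0.446, "alpha_prime_0": 0.491}

# Table 1 entries for the two generators ATLAS has used.
TABLE_MCATNLO_HERWIG6 = 0.55
TABLE_POWHEG_PYTHIA8 = 0.59

# Percentage increase of each extrapolation parameter between analyses.
atlas_shift = {
    name: 100.0 * (UPDATED[name] / PREVIOUS[name] - 1.0) for name in PREVIOUS
}
# Percentage increase expected from the change of generator alone.
table_shift = 100.0 * (TABLE_POWHEG_PYTHIA8 / TABLE_MCATNLO_HERWIG6 - 1.0)

for name, shift in atlas_shift.items():
    print(f"{name}: +{shift:.1f}%")
print(f"Table 1 expectation: +{table_shift:.1f}%")
```

All three increases come out close to 7%, in line with the statement above.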